A Supervised Abbreviation Resolution System for Medical Text

Authors

  • Pierre Zweigenbaum
  • Louise Deléger
  • Thomas Lavergne
  • Aurélie Névéol
  • Andreea Bodnari

Abstract

We present our participation in Task 2 of the 2013 CLEF-eHEALTH Challenge, whose goal was to determine the UMLS concept unique identifier (CUI), if one is available, of an abbreviation or acronym. We hypothesize that considering only the abbreviations of the training corpus could be sufficient to provide a strong baseline for this task. We therefore test how a fully supervised approach, which predicts the CUI of an abbreviation based only on the abbreviations and CUIs seen in the training corpus, fares on this task. We adapt the processing pipeline we developed for entity detection in CLEF-eHEALTH Task 1: a supervised MaxEnt model based on a set of features including UMLS CUIs, complemented here with a rule-based component for document headers. The system confirmed our hypothesis: it obtained an accuracy of 0.664 (strict) and 0.672 (relaxed), ranking second out of five teams.
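
The idea of predicting a CUI from training-corpus evidence alone can be illustrated with a much simpler stand-in than the MaxEnt model described above: a most-frequent-CUI lookup. The sketch below is not the authors' system. The function names, the lowercasing step, the `CUI-less` fallback label, and the placeholder CUIs in the usage note are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed label for abbreviations that have no UMLS concept.
CUI_LESS = "CUI-less"


def train_lookup(pairs):
    """Map each abbreviation to the CUI it was most often annotated with.

    `pairs` is an iterable of (abbreviation, cui) tuples from a training
    corpus. Lowercasing is a simplification assumed here; case can be
    meaningful for clinical abbreviations.
    """
    counts = defaultdict(Counter)
    for abbreviation, cui in pairs:
        counts[abbreviation.lower()][cui] += 1
    return {abbr: c.most_common(1)[0][0] for abbr, c in counts.items()}


def predict(lookup, abbreviation):
    """Return the stored CUI, or the fallback label for unseen abbreviations."""
    return lookup.get(abbreviation.lower(), CUI_LESS)


def accuracy(lookup, gold_pairs):
    """Fraction of (abbreviation, cui) pairs whose CUI is predicted exactly."""
    gold_pairs = list(gold_pairs)
    correct = sum(predict(lookup, abbr) == cui for abbr, cui in gold_pairs)
    return correct / len(gold_pairs)
```

As a usage example with invented placeholder identifiers, `train_lookup([("pt", "C0000001"), ("pt", "C0000001"), ("pt", "C0000002")])` yields a lookup in which `predict(lookup, "PT")` returns `"C0000001"`, while an abbreviation never seen in training falls back to `CUI_LESS`. A context-free lookup like this cannot disambiguate an abbreviation with several senses, which is one reason a feature-based classifier is used instead.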

Similar resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Semi-Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts

Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes, radiology reports and discharge summaries. In the medical domain, a significant part of the general problem of text normalization is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts and knowing their meaning is criti...

Towards a Learning Approach for Abbreviation Detection and Resolution

The explosion of biomedical literature and, with it, the uncontrolled creation of abbreviations presents some special challenges for both human readers and computer applications. We developed an annotated corpus of Dutch medical text, and experimented with two approaches to abbreviation detection and resolution. Our corpus is composed of abstracts from two medical journals from the Low Countries ...

Statistical-Based Abbreviation Expansion

The work presented in this paper deals with the text normalization for highly inflectional languages. This paper is focused on abbreviation expansion and likewise on numerals normalization. Our text normalization system does not use any explicit parser or part-of-speech tagger and thus it can be called lightly supervised. The standard rule-based text normalization method is compared with the pr...

A Proposed System to Identify and Extract Abbreviation Definitions in Spanish Biomedical Texts for the Biomedical Abbreviation Recognition and Resolution (BARR) 2017

Biomedical Abbreviation Recognition and Resolution (BARR) is an evaluation track of the 2nd Human Language Technologies for Iberian languages (IberEval) workshop, a workshop series organized by the Sociedad Española del Procesamiento del Lenguaje Natural (SEPLN). In this first edition of BARR, the focus is on the discovery of biomedical entities and abbreviations, and relating detected ...

Publication date: 2013